Sketching Word Vectors Through Hashing

Authors

  • Behrang Q. Zadeh
  • Laura Kallmeyer
  • Peyman Passban
Abstract

We propose a new, fast word embedding technique based on hash functions. The method is a derandomization of a new type of random projection: by disregarding the classic constraint used in designing random projections (i.e., preserving pairwise distances in a particular normed space), our solution exploits extremely sparse non-negative random projections. Our experiments show that the proposed method achieves competitive results, comparable to those of neural embedding learning techniques, yet at only a fraction of their computational cost. While the proposed derandomization improves the computational and space efficiency of our method, the possibility of applying weighting schemes such as positive pointwise mutual information (PPMI) to our models after their construction (and at a reduced dimensionality) imparts high discriminatory power to the resulting embeddings. Naturally, the method also retains other known benefits of random-projection-based techniques, such as ease of update.
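As a rough illustration of the ideas in the abstract (not the authors' actual algorithm), the sketch below accumulates co-occurrence counts directly at a reduced dimensionality via a deterministic hash (standing in for an extremely sparse, non-negative projection), then applies PPMI at that reduced dimensionality. The hash function, window size, and dimensionality are illustrative assumptions.

```python
import hashlib
import math
from collections import defaultdict

DIM = 64  # reduced dimensionality (illustrative)

def h(token: str) -> int:
    """Deterministically map a context word to one of DIM coordinates,
    a stand-in for a derandomized, extremely sparse projection."""
    return int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM

def build_vectors(corpus, window=2):
    """Accumulate hashed co-occurrence counts: each context word adds +1
    to a single hashed coordinate (non-negative, sparse by construction)."""
    vecs = defaultdict(lambda: [0.0] * DIM)
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    vecs[w][h(sent[j])] += 1.0
    return vecs

def ppmi(vecs):
    """Apply positive PMI after construction, at the reduced dimensionality,
    as the abstract notes is possible."""
    total = sum(sum(v) for v in vecs.values())
    row = {w: sum(v) for w, v in vecs.items()}
    col = [sum(v[d] for v in vecs.values()) for d in range(DIM)]
    out = {}
    for w, v in vecs.items():
        out[w] = [
            max(0.0, math.log((c * total) / (row[w] * col[d])))
            if c > 0 and col[d] > 0 else 0.0
            for d, c in enumerate(v)
        ]
    return out
```

Because the projection is deterministic, new text can be folded in at any time by re-running `build_vectors` on the new sentences and summing the count vectors, which reflects the "ease of update" property mentioned above.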


Similar Articles

2.3 Sketching using Locality Sensitive Hashing

In this lecture we will get to know several techniques that can be grouped under the general definition of sketching. When using the sketching technique, each element is replaced by a more compact representation of itself. An alternative algorithm is then run on the more compact representations. Finally, one has to show that this algorithm gives the same result as the original algorithm with high probability...
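A minimal, self-contained example of the sketching idea described above, using MinHash (a classic locality-sensitive sketch) purely for illustration: each set is replaced by a compact signature, and an algorithm (here, Jaccard similarity estimation) runs on the signatures instead of the original sets.

```python
import hashlib

def minhash(items, num_hashes=32):
    """Replace a set by a compact sketch: for each of num_hashes seeded
    hash functions, keep only the minimum hash value over the set."""
    return [
        min(int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)
            for x in items)
        for seed in range(num_hashes)
    ]

def est_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots is an unbiased estimate
    of the Jaccard similarity of the original sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

The compact signatures have fixed size regardless of how large the original sets are, which is exactly the space saving sketching aims for.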


FROSH: FasteR Online Sketching Hashing

Many hashing methods, especially data-dependent ones with good learning accuracy, are still inefficient when dealing with three critical problems in modern data analysis. First, data usually come in a streaming fashion, but most existing hashing methods are batch-based models. Second, when data become huge, the extensive computational time and large space requirements...


Hash2Vec, Feature Hashing for Word Embeddings

In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm...
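The feature-hashing ("hashing trick") mechanism referenced above can be sketched as follows; the hash function, signed-bucket scheme, and dimensionality are illustrative choices, not the paper's exact configuration.

```python
import hashlib

N_BUCKETS = 32  # illustrative reduced dimensionality

def hashed_features(context_words, dim=N_BUCKETS):
    """The hashing trick: each feature name is mapped directly to a bucket
    index and a sign, so the full vocabulary never has to be enumerated
    and the vector size is fixed in advance."""
    v = [0.0] * dim
    for w in context_words:
        digest = int(hashlib.sha1(w.encode()).hexdigest(), 16)
        idx = digest % dim                        # which bucket
        sign = 1.0 if (digest >> 64) % 2 == 0 else -1.0  # random-ish sign
        v[idx] += sign
    return v
```

The signed buckets make hash collisions cancel in expectation, which is why the hashing trick tends to preserve inner products despite the drastic dimensionality reduction.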


Hash Embeddings for Efficient Word Representations

We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings each token is represented by k d-dimensional embedding vectors and one k-dimensional weight vector. The final d-dimensional...
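A toy sketch of the combination step described above, under assumed sizes (k = 2, d = 4): each token selects k component vectors from a shared bucket table via k hash functions and combines them using its own k-dimensional weight vector. In the actual model both the table and the weights are learned; here they are supplied externally for illustration.

```python
import hashlib

K, D, BUCKETS = 2, 4, 1000  # illustrative sizes

def bucket(token, i):
    """The i-th hash function, mapping a token into the shared bucket table."""
    return int(hashlib.md5(f"{i}:{token}".encode()).hexdigest(), 16) % BUCKETS

def hash_embedding(token, table, weights):
    """Weighted sum of the k component vectors selected by the k hash
    functions: the token's final d-dimensional representation."""
    w = weights[token]  # the token's k-dimensional weight vector
    comps = [table[bucket(token, i)] for i in range(K)]
    return [sum(w[i] * comps[i][d] for i in range(K)) for d in range(D)]
```

Because many tokens share the same bucket table, the parameter count grows with BUCKETS rather than with the vocabulary size, while the per-token weight vector lets each token weight its shared components differently.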


Predicting IPO Performance from Nearest Neighbors Using TF-IDF Weighted Word Count Vectors

We introduce a novel approach to mining and leveraging data about stocks in order to predict the performance of new stocks following their initial public offering (IPO), a traditionally difficult task due to the lack of information and historical performance data. We collect a large corpus of articles for every existing stock between March 1st, 2014 and March 1st, 2015. We create weighted feature...
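A minimal sketch of the TF-IDF weighting named in the title (not the paper's full pipeline): terms frequent in a document but rare across the corpus receive high weight.

```python
import math
from collections import Counter

def tfidf(docs):
    """Term frequency times inverse document frequency over a tiny corpus
    of tokenized documents; returns one weight dict per document."""
    n = len(docs)
    df = Counter()          # document frequency of each term
    for d in docs:
        df.update(set(d))
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: (tf[t] / len(d)) * math.log(n / df[t]) for t in tf})
    return out
```

With this weighting, terms that appear in every document get weight zero, so nearest-neighbor comparisons are driven by each document's distinctive vocabulary.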



Journal:
  • CoRR

Volume: abs/1705.04253  Issue: -

Pages: -

Publication date: 2017